 roc curve


Active Bipartite Ranking

Neural Information Processing Systems

In this paper, we develop an active learning framework for the bipartite ranking problem. Motivated by numerous applications, ranging from supervised anomaly detection to credit-scoring through the design of medical diagnosis support systems, and usually formulated as the problem of optimizing (a scalar summary of) the ROC curve, bipartite ranking has been the subject of much attention in the passive context. Various dedicated algorithms have been recently proposed and studied by the machine-learning community. In contrast, active bipartite ranking is poorly documented in the literature. Due to its global nature, a strategy for sequentially labeling data points that are difficult to rank w.r.t. the others is required. This learning task is much more complex than binary classification, for which many active algorithms have been designed. It is the goal of this article to provide a rigorous formulation of such a selective sampling approach.


Statistical NLP for Optimization of Clinical Trial Success Prediction in Pharmaceutical R&D

Doane, Michael R.

arXiv.org Artificial Intelligence

This work presents the development and evaluation of an NLP-enabled probabilistic classifier designed to estimate the probability of technical and regulatory success (pTRS) for clinical trials in the field of neuroscience. While pharmaceutical R&D is plagued by high attrition rates and enormous costs, particularly within neuroscience, where success rates are below 10%, timely identification of promising programs can streamline resource allocation and reduce financial risk. Leveraging data from the ClinicalTrials.gov database and success labels from the recently developed Clinical Trial Outcome dataset, the classifier extracts text-based clinical trial features using statistical NLP techniques. These features were integrated into several non-LLM frameworks (logistic regression, gradient boosting, and random forest) to generate calibrated probability scores. Model performance was assessed on a retrospective dataset of 101,145 completed clinical trials spanning 1976-2024, achieving an overall ROC-AUC of 0.64. An LLM-based predictive model was then built using BioBERT, a domain-specific language representation encoder. The BioBERT-based model achieved an overall ROC-AUC of 0.74 and a Brier Score of 0.185, indicating its predictions had, on average, 40% less squared error than would be observed using industry benchmarks. The BioBERT-based model also made trial outcome predictions that were superior to benchmark values 70% of the time overall. By integrating NLP-driven insights into drug development decision-making, this work aims to enhance strategic planning and optimize investment allocation in neuroscience programs.
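
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how ROC-AUC and the Brier score can be computed from predicted probabilities, and how a relative reduction in squared error against a reference score can be expressed. The function names and the toy labels and probabilities are illustrative assumptions; they are not the paper's data, benchmark values, or code.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_error_reduction(bs_model, bs_reference):
    """Fractional reduction in Brier score relative to a reference score."""
    return 1.0 - bs_model / bs_reference


# Toy data (hypothetical): 1 = trial succeeded, 0 = trial failed.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.7]

bs = brier_score(labels, probs)
auc = roc_auc(labels, probs)
# 0.25 is the Brier score of always predicting 0.5, used here only as a
# stand-in reference; it is not the industry benchmark from the abstract.
gain = relative_error_reduction(bs, 0.25)
print(bs, auc, gain)
```

Lower Brier scores indicate better-calibrated, sharper probabilities, while ROC-AUC depends only on how the scores order the positive and negative examples.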


MedImageInsight for Thoracic Cavity Health Classification from Chest X-rays

Boya, Rama Krishna, Magalanadu, Mohan Kireeti, Palavalli, Azaruddin, Tekuri, Rupa Ganesh, Pattanayak, Amrit, Enuga, Prasanthi, Muthu, Vignesh Esakki, Boya, Vivek Aditya

arXiv.org Artificial Intelligence

Chest radiography remains one of the most widely used imaging modalities for thoracic diagnosis, yet increasing imaging volumes and radiologist workload continue to challenge timely interpretation. In this work, we investigate the use of MedImageInsight, a medical imaging foundational model, for automated binary classification of chest X-rays into Normal and Abnormal categories. Two approaches were evaluated: (1) fine-tuning MedImageInsight for end-to-end classification, and (2) employing the model as a feature extractor for a transfer learning pipeline using traditional machine learning classifiers. Experiments were conducted using a combination of the ChestX-ray14 dataset and real-world clinical data sourced from partner hospitals. The fine-tuned classifier achieved the highest performance, with an ROC-AUC of 0.888 and superior calibration compared to the transfer learning models, demonstrating performance comparable to established architectures such as CheXNet. These results highlight the effectiveness of foundational medical imaging models in reducing task-specific training requirements while maintaining diagnostic reliability. The system is designed for integration into web-based and hospital PACS workflows to support triage and reduce radiologist burden. Future work will extend the model to multi-label pathology classification to provide preliminary diagnostic interpretation in clinical environments.


Ranking Data with Continuous Labels through Oriented Recursive Partitions

Neural Information Processing Systems

We formulate a supervised learning problem, referred to as continuous ranking, where a continuous real-valued label Y is assigned to an observable r.v. X taking its values in a feature space X and the goal is to order all possible observations x in X by means of a scoring function s: X → R so that s(X) and Y tend to increase or decrease together with highest probability. This problem generalizes bi/multi-partite ranking to a certain extent and the task of finding optimal scoring functions s(x) can be naturally cast as optimization of a dedicated functional criterion, called the IROC curve here, or as maximization of the Kendall τ related to the pair (s(X),Y). From the theoretical side, we describe the optimal elements of this problem and provide statistical guarantees for empirical Kendall τ maximization under appropriate conditions for the class of scoring function candidates. We also propose a recursive statistical learning algorithm tailored to empirical IROC curve optimization and producing a piecewise constant scoring function that is fully described by an oriented binary tree. Preliminary numerical experiments highlight the difference in nature between regression and continuous ranking and provide strong empirical evidence of the performance of empirical optimizers of the criteria proposed.
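
The Kendall τ criterion mentioned above can be illustrated with a short sketch. It computes the empirical Kendall τ (the tau-a variant, which ignores tied pairs in the numerator) between scores s(x_i) and labels y_i. The function name and the toy data are assumptions for illustration only, not the paper's algorithm or experiments.

```python
def kendall_tau(scores, labels):
    """Empirical Kendall tau-a between two equally long sequences.

    Counts concordant minus discordant pairs, divided by the total
    number of pairs n * (n - 1) / 2.
    """
    n = len(scores)
    if n != len(labels) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (scores[i] - scores[j]) * (labels[i] - labels[j])
            if prod > 0:
                concordant += 1
            elif prod < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Toy example (hypothetical values).
y = [10.0, 30.0, 20.0]
s = [1.0, 2.0, 3.0]
print(kendall_tau(s, y))

# Any strictly increasing transform of the scores leaves tau unchanged,
# which is one way to see how ranking differs from regression: only the
# induced order matters, not the predicted values themselves.
s_cubed = [v ** 3 for v in s]
print(kendall_tau(s_cubed, y) == kendall_tau(s, y))
```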



SURFing to the Fundamental Limit of Jet Tagging

Pang, Ian, Faroughy, Darius A., Shih, David, Das, Ranit, Kasieczka, Gregor

arXiv.org Artificial Intelligence

Jet tagging is a central task in collider physics. Over the past decade, machine learning has driven major advances in jet tagging, with increasingly sophisticated architectures achieving very high classification performance on simulated datasets [1-11]. This success naturally raises a key question: have current jet taggers already reached the fundamental limit of jet tagging, or does a gap remain between practical performance and the true statistical optimum? The Neyman-Pearson (NP) limit, defined by the likelihood ratio, is the best possible discriminant between two different underlying physics processes - such as top and QCD jets - that any classifier could achieve if it had access to the exact data likelihoods [12]. In practice, however, this limit cannot be evaluated directly because the true likelihood of the data-generating process is unknown. It therefore remains unclear how close existing classifiers are to this ultimate bound. Recently, Ref. [13] proposed using autoregressive GPT-style generative models to probe this limit for top vs. QCD jets from the JetClass dataset [14]. These models operate on discretized, tokenized representations of jet constituents and yield explicit log-likelihoods, enabling the computation of likelihood ratios between jet classes.
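
As a toy illustration of the Neyman-Pearson limit described above, the sketch below uses a case where the likelihoods are known exactly: two one-dimensional unit-variance Gaussians. This setup is an assumption chosen for simplicity and has nothing to do with real jet data or the models in the paper. Here the log-likelihood ratio is available in closed form, and the ROC-AUC of the optimal discriminant is Φ(d/√2), with d the separation between the means.

```python
import math


def log_likelihood_ratio(x, mu_bkg, mu_sig):
    """log p_sig(x) - log p_bkg(x) for two unit-variance Gaussians."""
    return (mu_sig - mu_bkg) * x - (mu_sig ** 2 - mu_bkg ** 2) / 2.0


def optimal_auc(mu_bkg, mu_sig):
    """ROC-AUC of the likelihood-ratio test for two unit-variance Gaussians.

    The difference of one signal and one background draw is N(d, 2), so
    AUC = Phi(d / sqrt(2)) = 0.5 * (1 + erf(d / 2)).
    """
    d = abs(mu_sig - mu_bkg)
    return 0.5 * (1.0 + math.erf(d / 2.0))


# Hypothetical toy classes: background centred at 0, signal centred at 1.
print(log_likelihood_ratio(0.5, 0.0, 1.0))  # zero at the midpoint
print(optimal_auc(0.0, 1.0))
```

No classifier can exceed this AUC on the toy problem, because the likelihood ratio is the most powerful test statistic; when the true likelihoods are unknown, as in the jet-tagging setting, the bound can only be estimated.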



A Non-robust Model Training

Neural Information Processing Systems

The model confidence distributions are shown in Figure 10 and Figure 11. Each row contains the same model, trained adversarially and with standard training. In contrast, the robust models are better calibrated.